Tetrapod sperm length evolution in relation to body mass is shaped by multiple trade-offs

Sperm length is highly variable across species and many questions about its variation remain open. Although variation in body mass may affect sperm length evolution through its influence on multiple factors, the extent to which sperm length variation is linked to body mass remains elusive. Here, we use the Pareto multi-task evolution framework to investigate the relationship between sperm length and body mass across tetrapods. We find that tetrapods occupy a triangular Pareto front, indicating that trade-offs shape the evolution of sperm length in relation to body mass. By exploring the factors predicted to influence sperm length evolution, we find that sperm length evolution is mainly driven by sperm competition and clutch size, rather than by genome size. Moreover, the triangular Pareto front is maintained within endotherms, internal fertilizers, mammals and birds, suggesting similar evolutionary trade-offs within tetrapods. Finally, we demonstrate that the Pareto front is robust to phylogenetic dependencies and finite sampling bias. Our findings provide insights into the evolutionary mechanisms driving interspecific sperm length variation and highlight the importance of considering multiple trade-offs in optimizing reproductive traits.


Data collection
All data and the associated references are reported in the dataset uploaded to Figshare (https://figshare.com/articles/dataset/Dataset_Tetrapod_sperm_length_evolution_in_relation_to_body_mass_is_shaped_by_multiple_trade-offs_/26022289). During data collection, we prioritized recent sources that contained data for multiple species and supplemented them with data for individual species when possible. To avoid conflicts among datasets, we followed a standardized protocol when collecting data. First, we preferred sources that reported measures from n>1 individuals, and we thus excluded values measured from a single individual or from dead animals. We considered only mean values, not maximal values. We then preferred not to include values collected using data extraction software from images or values collected without a described and standardized methodology (e.g., "personal observations"). If multiple sources fit the above criteria, we prioritized more recent values.
Where multiple values remained, we prioritized the dataset with the largest sample size. In a few cases we spotted clear errors in the values reported in the most recent study, and we therefore reported the value contained in the original reference. All species names were standardized to the most recent nomenclature or, in the case of equivalent synonyms, to the most commonly used name. For Amphibia, we limited our analysis to anuran species, thus excluding two orders: Urodela and Gymnophiona. While we did not find data for Gymnophiona species, we decided to exclude Urodela (salamanders) for two reasons: 1) on average their sperm are extremely long and significantly longer than the average sperm size of tetrapods 2 and 2) we did not find data on clutch size and testes mass for the salamanders in our dataset. Including salamander species when exploring the sperm size-body mass morphospace would have extended the distribution of sperm size without allowing enrichment analysis on those phenotypes, given the absence of clutch size and testes mass data. We also excluded an outlier for sperm size: an anuran species (Discoglossus pictus) in which males produce the longest vertebrate sperm measured (2.5 mm) 2. It is worth noting, however, that the exclusion of Discoglossus pictus and Urodela species did not significantly affect the shape of the morphospace defined by sperm size and body mass.

Relative testes size as an index of the level of sperm competition
Testes mass has been shown to closely reflect the level of sperm competition among species and, within species, between males with alternative mating strategies 3. Since testes mass is also strongly correlated with body mass both in our dataset (Supplementary Fig. 3) and in other published studies (e.g., see Fig. 1 in 4), comparative studies routinely use the residuals of the log-log linear regression of testes mass on body mass to estimate whether the testes of a species are larger (reflecting high levels of sperm competition) or smaller (reflecting lower levels of sperm competition) than expected for its body size. Citing 3: "This prediction [i.e. that testes size reflects sperm competition level] is so widespread that testes size (correcting for body size) is commonly used as a proxy of sperm competition, even in the absence of any other information about a species' reproductive behaviour". We therefore used the same approach to provide results that are comparable with previous research in the field. Since tetrapod classes have significantly different regression slopes 4 (Supplementary Fig. 3), we calculated body mass-corrected testes size (hereafter RTS) from class-specific regression lines (Supplementary Table 1).
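As an illustration of the residual approach described above, the following minimal sketch computes RTS as the residual of a class-specific ordinary least-squares regression of log testes mass on log body mass. It is not the code used in the study: the function name `relative_testes_size`, the use of base-10 logarithms, the use of NumPy, and the toy values are all assumptions made for the example.

```python
import numpy as np

def relative_testes_size(body_mass, testes_mass, classes):
    """Residuals of class-specific log-log regressions of testes mass on body mass.

    Assumption: base-10 logs and ordinary least squares; the source only
    specifies a log-log linear regression fitted separately per class.
    """
    log_body = np.log10(np.asarray(body_mass, dtype=float))
    log_testes = np.log10(np.asarray(testes_mass, dtype=float))
    classes = np.asarray(classes)
    rts = np.empty_like(log_testes)
    for cls in np.unique(classes):
        mask = classes == cls
        # Fit one regression line per tetrapod class.
        slope, intercept = np.polyfit(log_body[mask], log_testes[mask], 1)
        # Positive residual: testes larger than expected for body mass.
        rts[mask] = log_testes[mask] - (intercept + slope * log_body[mask])
    return rts

# Toy values (hypothetical, not taken from the dataset).
body = [10.0, 100.0, 1000.0, 20.0, 200.0, 2000.0]
testes = [0.1, 0.5, 8.0, 0.4, 1.5, 9.0]
cls = ["Aves", "Aves", "Aves", "Mammalia", "Mammalia", "Mammalia"]
rts = relative_testes_size(body, testes, cls)
```

Because each class has its own fitted line, the residuals sum to zero within each class, so an RTS value is interpreted relative to other species of the same class rather than across all tetrapods.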
Supplementary Figure 3. Relationship of body mass with testes mass. Each class of tetrapods is plotted in the log-log space of body mass and testes mass. Amphibia n=197, Mammalia n=408, Aves n=324, Reptilia n=29.
Supplementary Table 2. Correlation between RTS calculated on the entire dataset and after the exclusion of the outliers.

Robustness of the Pareto front: control for phylogenetic bias between 65 Mya and the present day
Our control for phylogenetic biases indicates that the triangular-shaped Pareto front remained robust to phylogenetic dependencies until approximately 65 Mya, at which point the p-value exceeded the 0.05 threshold (Fig. 5d). To investigate the loss of significance at around 65 Mya, we hypothesized that the SibSwap randomization approach might become overly conservative as we approach the terminal nodes, because each sibling tip contains fewer and fewer species as we get closer to the present day. In such a case, the SibSwap-shuffled distributions could lead to nonsignificant outcomes when compared to the original Pareto front (p-values > 0.05), because the actual distributions of species might closely resemble the shuffled distributions. We therefore reasoned that, in such scenarios, a potential solution might be to treat each sibling tip as an independent data point in the trait space by averaging the trait values of the species it contains. The distribution of the resulting average values can then be tested for whether it falls within a triangular region in the trait space, as predicted by the Pareto theory. Importantly, the randomized datasets would now shuffle traits across all average values, which are treated as independent data points in the trait space. First, the number of sibling tips ranged from 68 to 922 as we approached the terminal nodes of the tree (Supplementary Fig. 5a). We then checked the marginal distributions of the traits in the sibling tips (Supplementary Fig. 5b) and noticed that the marginal distributions of the species within the sibling tips resembled those of the original dataset. Subsequently, we calculated the centroid of each sibling tip by averaging each trait across the species within the sibling tip, and we plotted the centroids in the trait space. When evaluating the triangularity of the distributions as a function of time points, we found that the p-values were mostly significant, indicating that the average values generated a continuously filled triangular distribution in trait space (Supplementary Fig. 5c). In